Boost-R: Gradient boosted trees for recurrence data
Authors
Abstract
Recurrence data arise from multi-disciplinary domains spanning reliability, cyber security, healthcare, online retailing, etc. This paper investigates an additive-tree-based approach, known as Boost-R (Boosting for Recurrence Data), for recurrent event data with both static and dynamic features. Boost-R constructs an ensemble of gradient boosted additive trees to estimate the cumulative intensity function of the recurrent event process, where a new tree is added to the ensemble by minimizing the regularized L2 distance between the observed and predicted cumulative intensities. Unlike conventional regression trees, a time-dependent function is constructed on each tree leaf. The sum of these functions, from multiple trees, yields the ensemble estimator of the cumulative intensity. The divide-and-conquer nature of tree-based methods is appealing when hidden sub-populations exist within a heterogeneous population. The non-parametric nature of regression trees helps avoid parametric assumptions on the complex interactions between event processes and features. Critical insights and advantages of Boost-R are investigated through comprehensive numerical examples. Datasets and computer code are made available on GitHub. To our best knowledge, Boost-R is the first gradient boosted additive-tree-based approach for modeling large-scale recurrent event data with both static and dynamic feature information.
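The boosting loop the abstract describes — repeatedly adding a tree that reduces the L2 distance between observed and predicted cumulative intensity — can be sketched on toy data. The following is a hedged illustration, not the authors' Boost-R implementation: it uses scikit-learn's `DecisionTreeRegressor` with time supplied as an ordinary split feature, which is a simplification of the paper's per-leaf time-dependent functions, and all names (`nu`, `grid`, etc.) are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy recurrence data: n units, each with one static feature x.
# True cumulative intensity: Lambda(t | x) = (1 + x) * t on [0, 1].
n = 200
grid = np.linspace(0.0, 1.0, 21)[1:]          # evaluation times t_1..t_20
x = rng.uniform(0.0, 1.0, size=n)

# Sample each unit's counting process N_i(t) at the grid points by
# cumulating independent Poisson increments (a valid Poisson-process path).
dt = np.diff(np.concatenate([[0.0], grid]))
inc = rng.poisson((1 + x[:, None]) * dt)
N = np.cumsum(inc, axis=1)

# Flatten to a regression problem: features (x_i, t), target N_i(t).
X = np.column_stack([np.repeat(x, grid.size), np.tile(grid, n)])
y = N.ravel().astype(float)

# L2 boosting: each shallow tree fits the residual between the observed
# cumulative counts and the current cumulative-intensity estimate.
pred = np.zeros_like(y)
nu = 0.1                                      # learning rate (shrinkage)
trees = []
for _ in range(100):
    tree = DecisionTreeRegressor(max_depth=2).fit(X, y - pred)
    pred += nu * tree.predict(X)
    trees.append(tree)

# Ensemble estimate of Lambda(1 | x_i); the true value is 1 + x_i.
est = pred.reshape(n, grid.size)[:, -1]
```

The divide-and-conquer behavior the abstract highlights shows up here as tree splits on the static feature `x`, which partition the heterogeneous population into subgroups with their own intensity estimates.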
Similar resources
Finding Influential Training Samples for Gradient Boosted Decision Trees
We address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way of formalizing this problem is studying how the model’s predictions change upon leave-one-out retraining, leaving out each individual training sample. Recent work has shown that, for parametric model...
Gradient Boosted Decision Trees for High Dimensional Sparse Output
In this paper, we study gradient boosted decision trees (GBDT) when the output space is high dimensional and sparse. For example, in multilabel classification, the output space is an L-dimensional 0/1 vector, where L is the number of labels that can grow to millions and beyond in many modern applications. We show that vanilla GBDT can easily run out of memory or encounter near-forever running ti...
GB-CENT: Gradient Boosted Categorical Embedding and Numerical Trees
Latent factor models and decision tree based models are widely used in tasks of prediction, ranking and recommendation. Latent factor models have the advantage of interpreting categorical features by a low-dimensional representation, while such an interpretation does not naturally fit numerical features. In contrast, decision tree based models enjoy the advantage of capturing the nonlinear inte...
Web-Search Ranking with Initialized Gradient Boosted Regression Trees
In May 2010 Yahoo! Inc. hosted the Learning to Rank Challenge. This paper summarizes the approach by the highly placed team Washington University in St. Louis. We investigate Random Forests (RF) as a low-cost alternative algorithm to Gradient Boosted Regression Trees (GBRT) (the de facto standard of web-search ranking). We demonstrate that it yields surprisingly accurate ranking results — compa...
Optimization with Gradient-Boosted Trees and Risk Control
Decision trees effectively represent the sparse, high dimensional and noisy nature of chemical data from experiments. Having learned a function from this data, we may want to thereafter optimize the function, e.g., picking the best chemical process catalyst. In this way, we may repurpose legacy predictive models. This work studies a large-scale, industrially-relevant mixed-integer quadratic opt...
Journal
Journal title: Journal of Quality Technology
Year: 2021
ISSN: 2575-6230, 0022-4065
DOI: https://doi.org/10.1080/00224065.2021.1948373